Fast Large-Scale Approximate Graph Construction for NLP
Authors
Abstract
Many natural language processing problems involve constructing large nearest-neighbor graphs. We propose a system called FLAG to construct such graphs approximately from large data sets. To handle the large amount of data, our algorithm maintains approximate counts based on sketching algorithms. To find the approximate nearest neighbors, our algorithm pairs a new distributed online-PMI algorithm with novel fast approximate nearest neighbor search algorithms (variants of PLEB). These algorithms return the approximate nearest neighbors quickly. We show our system’s efficiency in both intrinsic and extrinsic experiments. We further evaluate our fast search algorithms both quantitatively and qualitatively on two NLP applications.
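The abstract does not say which sketching algorithm FLAG uses. As an illustration only, the sketch below assumes a Count-Min sketch (one common choice for approximate counting) and uses its estimates to compute pointwise mutual information (PMI) for a word-context pair. The names `CountMinSketch` and `pmi`, the key prefixes, and the toy data are hypothetical and are not taken from the paper.

```python
import hashlib
import math


class CountMinSketch:
    """Approximate counter: estimates never fall below the true count."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # One salted hash per row stands in for `depth` independent hash functions.
        digest = hashlib.blake2b(
            key.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key):
        # Collisions can only inflate a cell, so the minimum is the tightest bound.
        return min(
            self.table[row][self._index(key, row)] for row in range(self.depth)
        )


def pmi(pair_count, word_count, context_count, total):
    """PMI = log( P(w, c) / (P(w) * P(c)) ), computed from raw counts."""
    return math.log(pair_count * total / (word_count * context_count))


# Toy (word, context) observations; assumed data for illustration only.
pairs = [("coffee", "drink"), ("coffee", "drink"), ("tea", "drink"), ("car", "drive")]

sketch = CountMinSketch()
for word, context in pairs:
    sketch.add("w:" + word)
    sketch.add("c:" + context)
    sketch.add("p:" + word + "|" + context)

score = pmi(
    sketch.estimate("p:coffee|drink"),
    sketch.estimate("w:coffee"),
    sketch.estimate("c:drink"),
    len(pairs),
)
print(score)
```

The sketch uses a fixed amount of memory regardless of vocabulary size, at the cost of occasionally overestimating a count when keys collide in every row.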
Similar resources
FLAG: Fast Large-Scale Graph Construction for NLP
Many natural language processing (NLP) problems involve constructing large nearest-neighbor graphs over words by computing distributional similarity between word pairs from large corpora. In this paper, we first describe a system called FLAG to construct such graphs approximately from large data sets. To handle the large amount of data in a memory- and time-efficient manner, FLAG maintains...
Large-scale Nonlinear Programming: An Integrating Framework for Enterprise-Wide Dynamic Optimization
Integration of real-time optimization and control with higher-level decision making (scheduling and planning) is an essential goal for profitable operation in a highly competitive environment. While integrated large-scale optimization models have been formulated for this task, their size and complexity remain a challenge to many available optimization solvers. On the other hand, recent develop...
A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is the process of automatically producing a structured representation from unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) that has been intensified by the increasing volume, heterogeneity, and unstructured form of information. One of the core information extraction tasks is relation extraction wh...
Fast kNN Graph Construction with Locality Sensitive Hashing
The k nearest neighbors (kNN) graph, perhaps the most popular graph in machine learning, plays an essential role in graph-based learning methods. Despite its many elegant properties, the brute-force kNN graph construction method has computational complexity of O(n^2), which is prohibitive for large-scale data sets. In this paper, based on the divide-and-conquer strategy, we propose an efficient a...
CONSTRAINED BIG BANG-BIG CRUNCH ALGORITHM FOR OPTIMAL SOLUTION OF LARGE SCALE RESERVOIR OPERATION PROBLEM
A constrained version of the Big Bang-Big Crunch algorithm for the efficient solution of optimal reservoir operation problems is proposed in this paper. The Big Bang-Big Crunch (BB-BC) algorithm is a new population-based meta-heuristic algorithm that relies on one of the theories of the evolution of the universe, namely the Big Bang and Big Crunch theory. An improved formulation of the algorithm na...
Publication date: 2012